Thirty Days of Metal — Day 19: Directional Shadows
This series of posts is my attempt to present the Metal graphics programming framework in small, bite-sized chunks for Swift app developers who haven’t done GPU programming before.
If you want to work through this series in order, start here. To download the sample code for this article, go here.
Adding lights to our scene certainly adds an element of realism, but there’s a problem. Unlike in the real world, we don’t get shadows for free. Instead, each triangle is shaded independently without regard for whether there is other geometry between it and the light.
To emulate shadows, we need some way to answer the question, “Does this point actually receive any light?” If the answer is no, the point should be in shadow. But how do we answer this question cheaply and accurately?
Our answer is shadow mapping. Shadow mapping is the procedure of rendering a depth map of the scene from the perspective of a light, then querying this depth map when drawing the scene to determine if a point is illuminated or in shadow. We call this kind of depth map a shadow map.
Here is a visualization of the shadow map we will learn to produce below.
This sounds simple enough, but there are some subtleties in the implementation that we will walk through in this article.
Positioning and Orienting Lights
When we were rendering with directional lights without accounting for shadows, we didn’t have a notion of light position, since all of the light’s rays were assumed to be parallel. However, if we want to render a depth map from the light’s perspective, we need to choose a point of view for it.
To extend our light abstraction for shadows and eventually point lights, we will replace the mutable direction member with a full model-to-world transform matrix. From this matrix, we can extract the location and direction of the light.
class Light {
// …
    var worldTransform: simd_float4x4 = matrix_identity_float4x4

    var position: SIMD3<Float> {
        return worldTransform.columns.3.xyz
    }

    var direction: SIMD3<Float> {
        return -worldTransform.columns.2.xyz
    }

    // …
}
The position of the light is the translational component of its transform, while its direction is the negative-Z axis of its orientation.
Look-At Transforms
Back on day 10, we learned about elementary transformations like translation, rotation, and scaling. We have since extended these operations with basic 3D transformations.
It turns out that there is a slightly more complex transformation type that is exceedingly useful: the look-at transformation.
To construct a look-at matrix, we need three parameters: the target point (what we’re looking at), the from point (where we’re looking from), and the “up” vector (which way is up). From these intuitive parameters, we can build a matrix that positions a camera or light and orients it toward a point of interest.
Mathematically, we follow a simplified form of Gram–Schmidt orthonormalization, first establishing the (negative) Z axis, then finding the X axis, and finally calculating the Y axis. We exploit the fact that the 3D vector cross product of two linearly-independent vectors is a third vector that is perpendicular to both. In code, it looks like this:
extension simd_float4x4 {
init(lookAt at: SIMD3<Float>,
from: SIMD3<Float>,
up: SIMD3<Float>)
{
let zNeg = normalize(at - from)
let x = normalize(cross(zNeg, up))
let y = normalize(cross(x, zNeg))
self.init(
SIMD4<Float>(x, 0),
SIMD4<Float>(y, 0),
SIMD4<Float>(-zNeg, 0),
SIMD4<Float>(from, 1)
)
}
}

With this look-at utility initializer, we can now more easily position our cameras and lights, without having to tediously combine matrices.
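As a quick usage sketch (the specific numbers here are illustrative, not taken from the sample code), we can build a world transform that looks at the origin from a point above and behind it:

```swift
import simd

// Illustrative only: aim a transform at the origin from (0, 2, 5),
// with +Y as up. Relies on the lookAt initializer defined above.
let cameraTransform = simd_float4x4(
    lookAt: SIMD3<Float>(0, 0, 0),
    from: SIMD3<Float>(0, 2, 5),
    up: SIMD3<Float>(0, 1, 0))
```

Because the initializer orthonormalizes its axes, the first three columns of the result always have unit length and are mutually perpendicular, regardless of the parameters we pass in.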
Shadow Projection Matrices
Just as we need a view matrix and projection matrix when rendering from the point of view of a camera, we need a view matrix and projection matrix for a shadow-casting light. We can find the light’s “view” matrix by taking the inverse of its world transform, the same way we do for a camera.
Since directional lights’ rays are all parallel, we will use our old friend the orthographic projection as our projection transform.
Determining the bounds of the view volume is a notorious problem full of trade-offs. To best utilize the texels in the shadow map, we want to tightly bound the scene with the view volume. If we don’t correctly select the top, left, bottom, and right parameters of the projection, we might not capture portions of the scene in the shadow map, causing us to later erroneously conclude they are in shadow when they should be illuminated (or vice versa). If we don’t select the near and far Z values carefully, we waste the limited precision of the depth range, causing blocky, jagged shadow edges.
For the time being, we will keep things simple and use a fixed projection matrix that happens to work well for our sample scene, but if we were writing a general-purpose renderer, we’d need to design heuristics for shaping the projection matrix for precision and accuracy.
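For reference, here is a sketch of one common fitting heuristic, under the assumption that we know the scene's axis-aligned bounding box: transform the box's corners into the light's view space and take their extents as the orthographic bounds. The function name and signature are our own invention, not part of the sample code.

```swift
import simd

// A sketch of one fitting heuristic (not the sample's approach):
// project the eight corners of the scene's bounding box into the
// light's view space, then bound them as tightly as possible.
func shadowBounds(sceneMin: SIMD3<Float>,
                  sceneMax: SIMD3<Float>,
                  lightViewMatrix: simd_float4x4) -> (min: SIMD3<Float>, max: SIMD3<Float>)
{
    var lo = SIMD3<Float>(repeating: .greatestFiniteMagnitude)
    var hi = SIMD3<Float>(repeating: -.greatestFiniteMagnitude)
    for i in 0..<8 {
        // Enumerate the eight corners of the axis-aligned bounding box.
        let corner = SIMD4<Float>((i & 1) == 0 ? sceneMin.x : sceneMax.x,
                                  (i & 2) == 0 ? sceneMin.y : sceneMax.y,
                                  (i & 4) == 0 ? sceneMin.z : sceneMax.z,
                                  1)
        let v = lightViewMatrix * corner
        let p = SIMD3<Float>(v.x, v.y, v.z)
        lo = simd_min(lo, p)
        hi = simd_max(hi, p)
    }
    return (lo, hi)
}
```

The left/right/bottom/top parameters would then come from the x and y extents, and the near/far values from the z extents (negated, since the light looks down its negative-Z axis).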
We add a projectionMatrix member to our Light class to compute the projection transform.
var projectionMatrix: simd_float4x4 {
return simd_float4x4(orthographicProjectionWithLeft: -1.5,
top: 1.5,
right: 1.5,
bottom: -1.5,
near: 0,
far: 10)
}

Creating Depth Textures
To the Light class we will also add a member that indicates if a light is a shadow-casting light, and an optional texture member to store the shadow map itself.
var castsShadows = false
var shadowTexture: MTLTexture?

Here is the complete configuration of our main directional light:
sunLight = Light()
sunLight.type = .directional
sunLight.worldTransform = simd_float4x4(
lookAt: SIMD3<Float>(0, 0, 0),
from: SIMD3<Float>(1, 1, 1),
up: SIMD3<Float>(0, 1, 0))
sunLight.castsShadows = true

We use a fixed shadow map resolution of 2048×2048. To create a shadow map, we first populate a Metal texture descriptor with a pixel format of MTLPixelFormat.depth32Float, a format in which each texel is a single-precision float depth value.
let shadowMapSize = 2048
let textureDescriptor =
MTLTextureDescriptor.texture2DDescriptor(
pixelFormat: .depth32Float,
width: shadowMapSize,
height: shadowMapSize,
mipmapped: false)
textureDescriptor.storageMode = .private
textureDescriptor.usage = [ .renderTarget, .shaderRead ]

We set the storage mode to MTLStorageMode.private, which means that the shadow texture will only be allocated on the GPU and not accessible to the CPU. We set its usage to a combination of MTLTextureUsage.renderTarget and MTLTextureUsage.shaderRead, since we will first render to the shadow map to store the depth values of the scene from the light’s perspective, then sample it in the main pass to determine which fragments are illuminated and which are in shadow.
We can then ask the device for a texture and store it in our shadow-casting light:
sunLight.shadowTexture = device.makeTexture(descriptor: textureDescriptor)

Multipass Rendering
So far, we have only encoded one pass per frame, drawing everything directly into a view’s drawable texture. However, sometimes we need to render to different textures to achieve certain effects. Shadow mapping is one such effect.
Render passes are delimited by render command encoders. In other words, whenever we want to start a new pass, we must construct a new render pass descriptor and create a new render command encoder. Command buffers can contain any number of render passes, but any time we need to change render attachments (textures), we must start a different pass. We call encoding more than one pass per frame “multipass rendering.”
To accommodate multipass rendering, we will refactor our draw method to call one member method per pass, first drawShadows(), then drawMainPass():
let commandBuffer = commandQueue.makeCommandBuffer()!
drawShadows(light: sunLight, commandBuffer: commandBuffer)
drawMainPass(renderPassDescriptor: renderPassDescriptor, commandBuffer: commandBuffer)
commandBuffer.present(view.currentDrawable!)
commandBuffer.commit()
Render Pass Descriptors for Shadow Mapping
To perform our shadow mapping pass, we first need to create a render pass descriptor and configure it to store depth values. To do so, we set our light’s shadow texture as the texture of the descriptor’s depth attachment:
let renderPassDescriptor = MTLRenderPassDescriptor()
renderPassDescriptor.depthAttachment.texture = light.shadowTexture

To ensure the depth texture is cleared to a known value at the start of the pass, we set its load action to MTLLoadAction.clear and set its clear depth to 1.0, signifying an infinite distance.
renderPassDescriptor.depthAttachment.loadAction = .clear
renderPassDescriptor.depthAttachment.clearDepth = 1.0

Finally, we set the depth attachment’s storage action to MTLStorageAction.store so the results of the shadow pass are available for the main pass.
renderPassDescriptor.depthAttachment.storeAction = .store

Rendering then proceeds as normal. We construct a model-view-projection matrix for each node from the light’s view and projection matrices and the node’s world matrix.
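Putting the pass together, a drawShadows method might look roughly like the following. The depth-stencil state, node collection, per-node draw helper, and buffer index are assumptions about the sample's structure, so treat this as a sketch rather than the exact implementation.

```swift
// A sketch of the shadow pass, assuming hypothetical names
// (depthStencilState, nodes, draw(_:encoder:)) for the sample's members.
func drawShadows(light: Light, commandBuffer: MTLCommandBuffer) {
    let renderPassDescriptor = MTLRenderPassDescriptor()
    renderPassDescriptor.depthAttachment.texture = light.shadowTexture
    renderPassDescriptor.depthAttachment.loadAction = .clear
    renderPassDescriptor.depthAttachment.clearDepth = 1.0
    renderPassDescriptor.depthAttachment.storeAction = .store

    let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor)!
    encoder.setRenderPipelineState(shadowRenderPipelineState)
    encoder.setDepthStencilState(depthStencilState)

    // The light's view matrix is the inverse of its world transform.
    let viewProjectionMatrix = light.projectionMatrix * light.worldTransform.inverse
    for node in nodes {
        var modelViewProjectionMatrix = viewProjectionMatrix * node.worldTransform
        encoder.setVertexBytes(&modelViewProjectionMatrix,
                               length: MemoryLayout<simd_float4x4>.size,
                               index: 2)
        draw(node, encoder: encoder) // hypothetical per-node draw helper
    }
    encoder.endEncoding()
}
```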
Shadow Mapping Vertex Function and Render Pipeline
The shader functions for shadow mapping are remarkably simple. In fact, there is no fragment function at all: the rasterized depth is written directly to the shadow map and we don’t need to write any custom code. The vertex function is trivial: we just multiply the vertex’s model position by the composed model-view-projection matrix to produce a clip space position:
vertex float4 vertex_shadow(
VertexIn in [[stage_in]],
constant float4x4 &modelViewProjectionMatrix [[buffer(2)]])
{
return modelViewProjectionMatrix * float4(in.position, 1.0);
}

To turn this vertex function into a render pipeline, we populate a render pipeline descriptor as usual, omitting the fragment function:
let shadowRenderPipelineDescriptor = MTLRenderPipelineDescriptor()
shadowRenderPipelineDescriptor.vertexDescriptor = vertexDescriptor
shadowRenderPipelineDescriptor.depthAttachmentPixelFormat = .depth32Float
shadowRenderPipelineDescriptor.vertexFunction = library.makeFunction(name: "vertex_shadow")

do {
shadowRenderPipelineState = try device.makeRenderPipelineState(descriptor: shadowRenderPipelineDescriptor)
} catch {
fatalError("Error while creating render pipeline state: \(error)")
}
Note that we can reuse the same vertex descriptor that we use for the main pass; we just don’t actually use any attributes other than the model-space position.
Using Shadow Maps
In the main pass, we will use the shadow map to determine which fragments are in shadow. We do this by first determining the fragment’s depth from the light’s perspective. If this depth is less than the value stored in the shadow map, the point is illuminated; otherwise it is in shadow.
To construct the fragment’s light-space position, we need to add a view-projection matrix to our light constants:
struct Light {
float4x4 viewProjectionMatrix;
float3 intensity;
float3 direction;
LightType type;
};

We can then adapt our directional lighting shader code to determine the degree to which we are in shadow and use this to modulate our lighting:
float shadowFactor = 1 - shadow(in.worldPosition, shadowMap, light.viewProjectionMatrix);
// …
diffuseFactor = shadowFactor * saturate(dot(N, L));
specularFactor = shadowFactor * powr(saturate(dot(N, H)), specularExponent);

The real meat of the algorithm is in the shadow() function, so let’s step through that, starting with the function signature:
static float shadow(float3 worldPosition,
depth2d<float, access::sample> depthMap,
constant float4x4 &viewProjectionMatrix)
{

The world position parameter is the world position of the current fragment, provided by the rasterizer. The depth map parameter is the light’s shadow texture, and the view-projection matrix is the light’s view-projection matrix.
To determine the depth of the closest fragment to the light, we transform the fragment into the light’s clip space, then manually perform a perspective divide to move into NDC. From there, we determine the shadow map texture coordinates that correspond to the fragment:
float4 shadowNDC = (viewProjectionMatrix * float4(worldPosition, 1));
shadowNDC.xyz /= shadowNDC.w;
float2 shadowCoords = shadowNDC.xy * 0.5 + 0.5;
shadowCoords.y = 1 - shadowCoords.y;

To compare the shadow depth and the fragment depth, we will sample the shadow map at these coordinates. Rather than taking a sampler state, we can use a constexpr sampler configured specially for depth map comparison:
constexpr sampler shadowSampler(
coord::normalized,
address::clamp_to_edge,
filter::linear,
compare_func::greater_equal);

To perform the depth comparison, we use the sample_compare function on the depth texture, passing the fragment’s shadow map coordinates and the depth value to compare to.
float shadowCoverage = depthMap.sample_compare(shadowSampler, shadowCoords, shadowNDC.z);
return shadowCoverage;
}

Since our sampler is configured with the “greater-or-equal” comparison function, the result will be 0 if the fragment is closer to the light than the value in the shadow map, and 1 otherwise. A value of 1 therefore means the fragment is in shadow. We subtract the result from 1 in the fragment shader to get the value by which we modulate the lighting.
At this point, we’ve implemented shadow mapping. If we run the sample app now, though, we see something awful.
Shadow Acne
The checkered or banded artifacts produced by shadow mapping are called “shadow acne.”
If our shadow map had infinite resolution and precision, our approach so far would probably work just fine. However, since both are limited, it’s possible for the depth comparison to produce false positives, considering fragments to be in shadow when they should be lit. This happens when the fragment’s projected depth lands just barely beyond the quantized depth sampled from the shadow map, even though both describe the same surface.
The simplest way to address shadow acne is with a depth bias. We subtract a tiny value from the projected depth, which causes the edge cases that produce shadow acne to resolve negatively instead of positively for shadowing.
Here is the modified portion of the shadow function:
const float depthBias = 5e-3f;
float shadowCoverage = depthMap.sample_compare(
shadowSampler,
shadowCoords,
shadowNDC.z - depthBias);

Implementing this depth bias resolves the issues for this scene, but the actual value of the depth bias depends on various factors such as the projection transform and the precision and resolution of the shadow map. If the bias needs tuning, it can be passed as a per-light parameter rather than using a hardcoded constant.
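One way to make the bias tunable, sketched under our own naming (the sample hardcodes the constant instead), is to store it on the light:

```swift
// Hypothetical addition to the Light class; shadowDepthBias is our
// own name, not part of the sample code. Its value would be copied
// into the per-light constants and passed to the shadow() function
// in place of the hardcoded depthBias.
var shadowDepthBias: Float = 5e-3
```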
Here is our completed sample app with directional shadows:
In the next article we will look at multisampled antialiasing, a technique for producing smoother images with relatively little processing overhead.